Deep Exploration for Recommendation Systems
Modern recommendation systems ought to benefit from probing for and learning
from delayed feedback. Research has tended to focus on learning from a user's
response to a single recommendation. Such work, which leverages methods of
supervised and bandit learning, forgoes learning from the user's subsequent
behavior. Where past work has aimed to learn from subsequent behavior, there
has been a lack of effective methods for probing to elicit informative delayed
feedback. Effective exploration through probing for delayed feedback becomes
particularly challenging when rewards are sparse. To address this, we develop
deep exploration methods for recommendation systems. In particular, we
formulate recommendation as a sequential decision problem and demonstrate
benefits of deep exploration over single-step exploration. Our experiments are
carried out with high-fidelity industrial-grade simulators and establish large
improvements over existing algorithms.
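The abstract does not spell out the deep exploration method itself. One standard way to realize deep exploration, sketched below as an illustration (all class names and hyperparameters are assumptions, not the paper's), is ensemble sampling: one value-function hypothesis is drawn per episode and followed greedily throughout, so exploration is temporally consistent rather than dithering step by step.

```python
import random

class EnsembleDeepExploration:
    """Illustrative sketch of deep exploration via ensemble sampling
    over tabular Q-functions (a stand-in for neural value networks)."""

    def __init__(self, n_states, n_actions, n_models=10, lr=0.1, gamma=0.99, seed=0):
        self.rng = random.Random(seed)
        # Each ensemble member starts from its own random prior.
        self.ensemble = [
            [[self.rng.uniform(0.0, 1.0) for _ in range(n_actions)]
             for _ in range(n_states)]
            for _ in range(n_models)
        ]
        self.lr, self.gamma = lr, gamma
        self.active = 0  # hypothesis used for the current episode

    def begin_episode(self):
        # Deep exploration: commit to one sampled hypothesis per episode.
        self.active = self.rng.randrange(len(self.ensemble))

    def act(self, state):
        q = self.ensemble[self.active][state]
        return max(range(len(q)), key=lambda a: q[a])

    def update(self, state, action, reward, next_state, done):
        # Train every member on the transition (a bootstrapped variant
        # would mask transitions per member to keep them diverse).
        for q in self.ensemble:
            target = reward if done else reward + self.gamma * max(q[next_state])
            q[state][action] += self.lr * (target - q[state][action])
```

Contrast with epsilon-greedy: an epsilon-greedy agent randomizes independently at every step, which rarely strings together the long action sequences needed to reach sparse delayed rewards.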
Scalable Neural Contextual Bandit for Recommender Systems
High-quality recommender systems ought to deliver both innovative and
relevant content through effective and exploratory interactions with users.
Yet, supervised learning-based neural networks, which form the backbone of many
existing recommender systems, only leverage recognized user interests, falling
short when it comes to efficiently uncovering unknown user preferences. While
there has been some progress with neural contextual bandit algorithms towards
enabling online exploration through neural networks, their onerous
computational demands hinder widespread adoption in real-world recommender
systems. In this work, we propose a scalable sample-efficient neural contextual
bandit algorithm for recommender systems. To do this, we design an epistemic
neural network architecture, Epistemic Neural Recommendation (ENR), that
enables Thompson sampling at a large scale. In two distinct large-scale
experiments with real-world tasks, ENR significantly boosts click-through rates
and user ratings by at least 9% and 6% respectively compared to
state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves
equivalent performance with at least 29% fewer user interactions compared to
the best-performing baseline algorithm. Remarkably, while accomplishing these
improvements, ENR demands orders of magnitude fewer computational resources
than neural contextual bandit baseline algorithms.
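The ENR architecture itself is not described in the abstract. As a generic illustration of Thompson sampling with an epistemic (uncertainty-aware) model, the sketch below uses an ensemble of linear reward models; every class and parameter name here is an assumption for exposition, not the paper's design. Per request, one ensemble member is drawn as a posterior sample and the candidate it scores highest is recommended; observed rewards update every member toward its own noise-perturbed target to keep the ensemble diverse.

```python
import random

class EnsembleThompsonRecommender:
    """Illustrative sketch (not the actual ENR architecture): Thompson
    sampling over an ensemble of linear reward models."""

    def __init__(self, dim, n_models=5, lr=0.05, noise=0.1, seed=0):
        self.rng = random.Random(seed)
        self.models = [[self.rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(n_models)]
        self.lr, self.noise = lr, noise

    def _score(self, w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def recommend(self, candidates):
        # candidates: list of item feature vectors
        w = self.rng.choice(self.models)  # acts as a posterior sample
        return max(range(len(candidates)),
                   key=lambda i: self._score(w, candidates[i]))

    def update(self, x, reward):
        for w in self.models:
            # Perturbed target keeps members spread out (epistemic
            # uncertainty shrinks only where data accumulates).
            target = reward + self.rng.gauss(0, self.noise)
            err = target - self._score(w, x)
            for j in range(len(w)):
                w[j] += self.lr * err * x[j]
```

The scalability point in the abstract is that a single shared network with a small epistemic head can stand in for the full ensemble; the ensemble above is simply the easiest such model to write down.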
Optimism Based Exploration in Large-Scale Recommender Systems
Bandit learning algorithms have been an increasingly popular design choice
for recommender systems. Despite the strong interest in bandit learning from
the community, multiple bottlenecks still prevent many bandit learning
approaches from reaching production. Two of the most important bottlenecks are
scaling to multi-task settings and A/B testing. Classic bandit algorithms,
especially those leveraging contextual information, often require rewards for
uncertainty estimation, which hinders their adoption in multi-task recommender
systems. Moreover, unlike supervised learning algorithms, bandit learning
algorithms place great emphasis on the data collection process through their
explorative nature. Such explorative behavior induces unfair
evaluation for bandit learning agents in a classic A/B test setting. In this
work, we present a novel design of production bandit learning life-cycle for
recommender systems, along with a novel set of metrics to measure their
efficiency in user exploration. We show through large-scale production
recommender system experiments and in-depth analysis that our bandit agent
design improves personalization for the production recommender system and our
experiment design fairly evaluates the performance of bandit learning
algorithms.
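The abstract does not detail the agent design. The simplest concrete form of optimism-based exploration is the classic UCB1 rule, sketched below purely as an illustration (the class name and constants are assumptions): each item's estimated value is inflated by a confidence bonus that shrinks as the item accumulates impressions, so under-explored items are recommended optimistically.

```python
import math

class UCBRecommender:
    """Illustrative sketch of optimism-based exploration (UCB1)
    over a fixed item set."""

    def __init__(self, n_items, c=2.0):
        self.counts = [0] * n_items   # impressions per item
        self.values = [0.0] * n_items  # running mean reward per item
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:
                return i  # try every item once first
        # Optimism: estimated value plus a confidence bonus.
        return max(range(len(self.counts)),
                   key=lambda i: self.values[i]
                       + math.sqrt(self.c * math.log(self.t) / self.counts[i]))

    def update(self, item, reward):
        self.counts[item] += 1
        n = self.counts[item]
        self.values[item] += (reward - self.values[item]) / n
```

The evaluation problem the abstract raises follows directly: an agent like this deliberately spends some impressions on uncertain items, so a naive A/B test that scores only short-term reward penalizes exactly the behavior the agent needs to learn.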
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming arguments showing that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
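The two ingredients named in the abstract, temporal difference learning and a one-step improvement over the auction's base policy, can be sketched in a few lines. This is an illustration under assumed names and parameters, not the production system: TD(0) estimates the long-term value of user states under the existing policy, and the auction score is then biased by the value of the state each candidate leads to.

```python
def td0_update(values, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) backup toward the long-term value of the base policy.

    values: dict mapping user state -> estimated long-term value.
    """
    v = values.get(s, 0.0)
    target = r + gamma * values.get(s_next, 0.0)
    values[s] = v + alpha * (target - v)

def auction_score(immediate_value, long_term_value, beta=0.5):
    """One-step policy improvement: keep the auction's immediate
    expected-return term, but bias toward candidates whose successor
    states have higher learned long-term value."""
    return immediate_value + beta * long_term_value
```

Because the score remains a per-candidate scalar, ranking by `auction_score` stays compatible with the existing auction machinery; only the scoring function changes.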
Dietary inflammatory potential mediated gut microbiota and metabolite alterations in Crohn's disease: A fire-new perspective
Background & aims: A pro-inflammatory diet interacting with the gut microbiome might trigger Crohn's disease (CD). We aimed to investigate the relationship between dietary inflammatory potential and microbiota/metabolite changes and their link with CD.

Methods: The dietary inflammatory potential was assessed using a dietary inflammatory index (DII) based on the Food Frequency Questionnaire from 150 new-onset CD patients and 285 healthy controls (HCs). We selected 41 CD patients and 89 HCs who had not received medication for metagenomic and targeted metabolomic sequencing to profile their gut microbial composition as well as fecal and serum metabolites. DII scores were classified into quartiles to investigate associations among different variables.

Results: DII scores of CD patients were significantly higher than those of HCs (0.56 ± 1.20 vs 0.23 ± 1.02, p = 0.017). With adjustment for confounders, a higher DII score was significantly associated with a higher risk of CD (OR: 1.420; 95% CI: 1.049, 1.923; p = 0.023). The DII score was also positively correlated with disease activity (p = 0.001). Morganella morganii and Veillonella parvula were increased while Coprococcus eutactus was decreased in the pro-inflammatory diet group, as well as in CD. DII-related bacteria were associated with disease activity and inflammatory markers in CD patients. Among the metabolic changes, the metabolite alterations induced by a pro-inflammatory diet largely involved amino acid metabolic pathways, which were also observed in CD.

Conclusions: A pro-inflammatory diet might be associated with increased risk and disease activity of CD. A high-DII diet is potentially involved in CD by mediating alterations in gut microbiota and metabolites.
Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning
This paper addresses the important need for advanced techniques in
continuously allocating workloads on shared infrastructures in data centers, a
problem arising due to the growing popularity and scale of cloud computing. It
particularly emphasizes the scarcity of research ensuring guaranteed capacity
in capacity reservations during large-scale failures. To tackle these issues,
the paper presents scalable solutions for resource management. It builds on the
prior establishment of capacity reservation in cluster management systems and
the two-level resource allocation problem addressed by the Resource Allowance
System (RAS). Recognizing the limitations of Mixed Integer Linear Programming
(MILP) for server assignment in a dynamic environment, this paper proposes the
use of Deep Reinforcement Learning (DRL), which has been successful in
achieving long-term optimal results for time-varying systems. Because directly
applying DRL algorithms to large-scale instances with millions of decision
variables is impractical, a novel two-level design that utilizes a DRL-based
algorithm is introduced to solve the optimal server-to-reservation assignment,
taking into account fault tolerance, server movement minimization, and network
affinity requirements. The paper explores the interconnection of
these levels and the benefits of such an approach for achieving long-term
optimal results in the context of large-scale cloud systems. We further show in
the experiment section that our two-level DRL approach outperforms the MIP
solver and heuristic approaches and exhibits significantly reduced computation
time compared to the MIP solver. Specifically, our two-level DRL approach
performs 15% better than the MIP solver on minimizing the overall cost. Also,
it uses only 26 seconds to execute 30 rounds of decision making, while the MIP
solver needs nearly an hour.
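The two-level decomposition described above can be made concrete with a deliberately simplified sketch; the functions below are illustrative stand-ins (all names and heuristics are assumptions), with a proportional split playing the role of the upper-level DRL policy and a keep-then-top-up rule playing the role of the lower-level server assignment that minimizes server movement.

```python
def split_reservation(demand, cluster_capacity):
    """Level 1 (stand-in for the DRL policy): split a region-wide
    reservation across clusters proportionally to capacity."""
    total = sum(cluster_capacity)
    shares = [demand * c // total for c in cluster_capacity]
    # Distribute the rounding remainder to the largest clusters first.
    rem = demand - sum(shares)
    order = sorted(range(len(cluster_capacity)),
                   key=lambda i: -cluster_capacity[i])
    for i in order[:rem]:
        shares[i] += 1
    return shares

def assign_servers(share, current, free):
    """Level 2: keep currently assigned servers to minimize movement,
    then top up from the free pool."""
    keep = current[:share]
    needed = share - len(keep)
    return keep + free[:max(0, needed)]
```

The real system replaces the proportional split with a learned policy and folds fault tolerance and network affinity into the lower level, but the structure, a small upper-level decision feeding many tractable per-cluster subproblems, is what makes million-variable instances feasible.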